πŸ•ΈοΈ Ada Research Browser


cmmc-ssp-autogen-saas

AI-powered SaaS that ingests PDF/DOCX, auto-maps content to CMMC Level 2 controls, and generates audit-ready System Security Plans. Secure multitenant architecture with role-based access, dashboards, and automated compliance scoring. Built for Defense Industrial Base readiness.

    mvp-cmmc-ssp/
    ├─ README.md
    ├─ backend/
    │  ├─ app/
    │  │  ├─ main.py
    │  │  ├─ auth.py
    │  │  ├─ models.py
    │  │  ├─ storage.py
    │  │  ├─ processor.py
    │  │  ├─ mapping.py
    │  │  ├─ ssp_generator.py
    │  │  └─ deps.py
    │  ├─ Dockerfile
    │  └─ requirements.txt
    ├─ frontend/
    │  ├─ package.json
    │  └─ src/
    │     ├─ App.jsx
    │     ├─ components/Upload.jsx
    │     ├─ components/Dashboard.jsx
    │     └─ services/ws.js
    ├─ infra/
    │  ├─ docker-compose.yml
    │  ├─ terraform/
    │  │  ├─ main.tf
    │  │  └─ providers.tf
    │  └─ zap_scan.sh
    ├─ docs/
    │  ├─ cmmc_controls.json
    │  └─ mapping_template.md
    ├─ tests/
    │  ├─ test_processor.py
    │  └─ test_auth.py
    └─ scripts/
       └─ local_start.sh

MVP: CMMC Level 2 SSP Generator (Production-minded demo)

Purpose: Demo-ready SaaS MVP that parses DOCX/PDF files, maps the extracted content to NIST SP 800-171 (CMMC L2) controls, and produces audit-ready SSP outputs in DOCX and PDF. Includes a React front-end with real-time dashboards.

Important references:

- NIST SP 800-171 Rev. 2 (control set used). See NIST.
- CMMC Level 2 aligns to the 110 controls in NIST SP 800-171.

Run locally (dev):

  1. Copy .env.template to .env and supply secrets (AWS S3, JWT secret, LLM API key).

  2. Run ./scripts/local_start.sh

  3. Frontend: http://localhost:3000 ; Backend: http://localhost:8000

Acceptance test:

- Upload a sample DOCX/PDF → check /_status WebSocket progress → download the generated SSP.docx and SSP.pdf.

Security/hardening checklist (must be completed before production):

- HSM-backed key management (AWS KMS with GovCloud keys or a dedicated HSM).
- Replace the simple JWT with short-lived access tokens plus refresh tokens and session revocation.
- Pen test and full OWASP ZAP scan (script provided). Ensure no critical findings.
- Host in GovCloud with strictly controlled IAM roles and VPC endpoints.
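The token item in the checklist above can be illustrated with a minimal, standard-library-only sketch. It is not the repo's auth.py: the names `issue_token`, `verify_token`, and `REVOKED_SESSIONS` are hypothetical, the HMAC-signed format is a simplified stand-in for a real JWT, and a production build would use a vetted token library with KMS-backed keys and a shared revocation store.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

# Hypothetical in-memory revocation list; a real service needs a shared store.
REVOKED_SESSIONS = set()


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(key: bytes, payload: str) -> str:
    return _b64(hmac.new(key, payload.encode(), hashlib.sha256).digest())


def issue_token(key: bytes, tenant_id: str, ttl_seconds: int = 300,
                now: Optional[float] = None) -> str:
    """Issue a signed token that expires ttl_seconds after 'now'."""
    issued = time.time() if now is None else now
    claims = {
        "tenant_id": tenant_id,
        "sid": secrets.token_hex(8),      # session id, used for revocation
        "exp": issued + ttl_seconds,
    }
    payload = _b64(json.dumps(claims, sort_keys=True).encode())
    return f"{payload}.{_sign(key, payload)}"


def verify_token(key: bytes, token: str,
                 now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the token is authentic, unexpired, and not revoked."""
    try:
        payload, sig = token.split(".")
        if not hmac.compare_digest(sig, _sign(key, payload)):
            return None
        claims = json.loads(_unb64(payload))
    except (ValueError, TypeError):
        return None
    current = time.time() if now is None else now
    if claims["exp"] <= current or claims["sid"] in REVOKED_SESSIONS:
        return None
    return claims
```

Revoking a session is then a matter of adding its `sid` to the revocation store; a short `ttl_seconds` bounds how long a leaked token stays useful even if revocation is delayed.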

CI / ZAP scan script (infra/zap_scan.sh)

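    #!/usr/bin/env bash
    # Simple OWASP ZAP baseline scan for a local deployment.

    # Mount the working directory at /zap/wrk so the HTML report is written to the host.
    docker run -v "$(pwd):/zap/wrk/:rw" -t owasp/zap2docker-stable zap-baseline.py -t http://host.docker.internal:8000 -r zap_report.html

    # TODO: parse the report and fail if critical findings exist; implement the policy in CI.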
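The script's last comment leaves report parsing to CI. One possible gate is sketched below, under two assumptions: the scan is also run with `-J zap_report.json` so a JSON report exists, and that report follows ZAP's traditional JSON layout (a `site` list whose entries carry `alerts` with a string `riskcode`). ZAP rates findings as High, Medium, Low, or Informational rather than "critical", so the sketch treats High as blocking; the function name and threshold are illustrative.

```python
import json
from pathlib import Path

HIGH_RISK = 3  # ZAP risk codes: 0 informational, 1 low, 2 medium, 3 high


def blocking_alerts(report: dict, threshold: int = HIGH_RISK) -> list:
    """Return the names of alerts whose risk code is at or above threshold."""
    names = []
    for site in report.get("site", []):
        for alert in site.get("alerts", []):
            if int(alert.get("riskcode", 0)) >= threshold:
                names.append(alert.get("name") or alert.get("alert", "unnamed alert"))
    return names


def gate(path: str = "zap_report.json") -> int:
    """Exit-code style result: 1 if any blocking alert is present, else 0."""
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    found = blocking_alerts(report)
    for name in found:
        print(f"blocking ZAP finding: {name}")
    return 1 if found else 0
```

In CI, a wrapper could call `sys.exit(gate())` after the scan so the pipeline fails whenever a High-risk alert is reported.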

Implementation notes & production hardening (you must do these)

  1. Tenant isolation: current JWT contains tenant_id. Enforce DB row-level tenant scoping for every query. Consider separate S3 prefixes + encryption keys per tenant, and use IAM policies limiting access to those prefixes.

  2. KMS / HSM: Replace JWT_SECRET with KMS-signed tokens and use AWS KMS for all encryption keys. Audit key usage.

  3. LLM & embeddings: Current mapping uses local sentence-transformers. For higher accuracy and scale, swap the embedding calls to an enterprise LLM or a hosted vector DB (Pinecone / Milvus), and optionally fine-tune the model on SSP/POA&M examples. Keep the raw documents encrypted at rest; route LLM requests through VPC endpoints if using a cloud LLM.

  4. Evidence chain & explainability: Save chunk offsets and original text excerpts as evidence. Store hashes of original docs in manifest (for non-repudiation).

  5. SSP formatting: The generator creates a clean DOCX; for an auditor-ready PDF, convert via WeasyPrint or a signed PDF pipeline, and add a watermark and an audit page.

  6. Audit logging & monitoring: All processing steps must write immutable audit events to an append-only store (CloudWatch Logs with KMS, or Splunk). Ensure retention & rotation policies meet DFARS contract requirements.

  7. Pen test & SAST/DAST: Run OWASP ZAP and fix criticals; perform a full code review of sensitive endpoints. Acceptance criteria include zero critical ZAP findings.

  8. CI/CD: Terraform plan/apply in GovCloud using locked-down service principals, with remote state in secure S3 and DynamoDB locking. Consider ephemeral build agents inside GovCloud for end-to-end compliance.
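The evidence-chain note (item 4) can be sketched as follows. This is an illustration, not the repo's implementation: `build_manifest`, `verify_manifest`, and the manifest fields are hypothetical names, and the offsets are assumed to index into the extracted text.

```python
import hashlib
from dataclasses import asdict, dataclass


@dataclass
class Evidence:
    control_id: str  # e.g. "3.1.1"
    start: int       # character offset into the extracted text
    end: int
    excerpt: str     # original text kept alongside the offsets


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(doc_name: str, doc_bytes: bytes, text: str, spans: list) -> dict:
    """spans is a list of (control_id, start, end) tuples over 'text'."""
    evidence = [Evidence(cid, s, e, text[s:e]) for cid, s, e in spans]
    return {
        "document": doc_name,
        "sha256": sha256_hex(doc_bytes),  # hash of the original upload
        "evidence": [asdict(ev) for ev in evidence],
    }


def verify_manifest(manifest: dict, doc_bytes: bytes, text: str) -> bool:
    """Check the document hash and that each excerpt still matches its offsets."""
    if manifest["sha256"] != sha256_hex(doc_bytes):
        return False
    return all(
        text[ev["start"]:ev["end"]] == ev["excerpt"]
        for ev in manifest["evidence"]
    )
```

A bare hash gives tamper evidence only; the non-repudiation goal mentioned above would additionally need the manifest to be signed, for example with a KMS-held key.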

Where the repo intentionally leaves choices for you (and why)

- LLM provider: For DoD workflows you might prefer an on-prem or FedRAMP-authorized LLM endpoint. I kept model calls local (sentence-transformers) for a reproducible demo that does not expose secrets. Swap to OpenAI/Anthropic with private endpoints or an on-prem model for FedRAMP compliance.

- Vector DB: FAISS works for the MVP. For multi-tenant scale use Pinecone, Milvus, or an RDS-backed vector store inside GovCloud.

- Fine-tuning: If you want high accuracy (for example, >95% control coverage as an acceptance criterion), you'll almost certainly need supervised fine-tuning using labeled SSPs and evidence. The code contains CMMCMapper hook points for plugging in a fine-tuned model.
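To make the mapping step concrete, here is a toy sketch of similarity-based control matching. It is not the CMMCMapper code: `embed` is a deliberately crude word-count stand-in for a sentence-transformers embedding, the control texts are abbreviated paraphrases, and `map_chunk` and `min_score` are hypothetical names. The shape is the point: embed the chunk, score it against every control, keep the best match above a threshold.

```python
import math
import re
from collections import Counter

# Abbreviated paraphrases of three NIST SP 800-171 requirements (illustrative only).
CONTROLS = {
    "3.1.1": "limit system access to authorized users and devices",
    "3.3.1": "create and retain system audit logs and records",
    "3.5.3": "use multifactor authentication for privileged and network access",
}


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def map_chunk(chunk: str, controls: dict = CONTROLS, min_score: float = 0.3):
    """Return (control_id, score) for the best match, or (None, score) if too weak."""
    vec = embed(chunk)
    score, control_id = max(
        (cosine(vec, embed(desc)), cid) for cid, desc in controls.items()
    )
    return (control_id, score) if score >= min_score else (None, score)
```

Swapping `embed` for a real embedding model (and the dictionary scan for a FAISS or hosted vector index) keeps the same interface, which is where a fine-tuned model would plug in.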

Final checklist to finish before you label this "production-ready" (do not skip)

- Populate docs/cmmc_controls.json with all 110 controls (use the NIST doc).

- Implement a persistent manifest DB (Postgres with RLS for tenant isolation).

- Integrate KMS and rotate keys.

- Replace the dev JWT flow with short-lived tokens, refresh tokens, and device binding.

- Configure VPC-only access for the LLM provider and S3 using VPC endpoints.

- Audit the CI/CD and Terraform flow for GovCloud: require manual approvals for production apply.
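For the first item in this checklist, a small validation pass can catch an incomplete controls file before it reaches the mapper. The sketch below assumes a schema the README does not specify: a JSON list of objects with `id` and `description` fields, with IDs in the `3.<family>.<number>` form used by NIST SP 800-171 Rev. 2 (families 3.1 through 3.14). The function name is hypothetical.

```python
import json
import re
from pathlib import Path

EXPECTED_COUNT = 110  # requirement count cited in this README
ID_PATTERN = re.compile(r"^3\.(\d{1,2})\.(\d{1,2})$")


def validate_controls(controls: list) -> list:
    """Return a list of human-readable problems; an empty list means the file passes."""
    problems = []
    seen = set()
    for index, control in enumerate(controls):
        control_id = str(control.get("id", ""))
        match = ID_PATTERN.match(control_id)
        if not match or not 1 <= int(match.group(1)) <= 14:
            problems.append(f"entry {index}: bad control id {control_id!r}")
            continue
        if control_id in seen:
            problems.append(f"entry {index}: duplicate id {control_id}")
        seen.add(control_id)
        if not str(control.get("description", "")).strip():
            problems.append(f"entry {index}: missing description for {control_id}")
    if len(seen) != EXPECTED_COUNT:
        problems.append(
            f"expected {EXPECTED_COUNT} unique controls, found {len(seen)}"
        )
    return problems


def validate_file(path: str = "docs/cmmc_controls.json") -> list:
    return validate_controls(json.loads(Path(path).read_text(encoding="utf-8")))
```

Running `validate_file()` in a unit test or CI step would turn "populate all 110 controls" into a check that fails loudly instead of a manual reminder.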

Delivery & provenance

To get started immediately:

  1. Create repo and paste the files above (or I can produce each file in full if you want one giant paste).

  2. Populate .env with S3 creds and JWT secret for local dev.

  3. Run ./scripts/local_start.sh (script spins up uvicorn and vite).

  4. Test upload → watch WebSocket progress → download SSP.docx.